# Low-latency Inference

- **Arch Router 1.5B.gguf** (katanemo) | License: Other | Large Language Model, Transformers, English | 220 downloads, 1 like
  Arch-Router is a 1.5B-parameter preference-aligned routing model that maps queries to domain-operation preferences to drive model routing decisions.
- **Dmind 1** (DMindAI) | License: MIT | Large Language Model, Transformers, Multilingual | 129 downloads, 21 likes
  DMind-1 is a Web3 expert model built on Qwen3-32B, optimized for the Web3 ecosystem through supervised instruction fine-tuning and reinforcement learning from human feedback, with significant gains in task accuracy, content safety, and expert-level interaction alignment.
- **Treehop Rag** (allen-li1231) | License: MIT | Question Answering System | 36 downloads, 3 likes
  TreeHop is a lightweight embedding-level framework for efficient query-embedding generation and filtering in multi-hop QA, significantly reducing computational overhead.
- **Distil Large V3.5 Ct2** (distil-whisper) | License: MIT | Speech Recognition, English | 264 downloads, 3 likes
  Distil-Whisper is a distilled version of the Whisper model that achieves efficient speech recognition through large-scale pseudo-labeling.
- **Canary 180m Flash** (nvidia) | Speech Recognition, Multilingual | 15.17k downloads, 60 likes
  NVIDIA NeMo Canary Flash is a multilingual, multitask speech model supporting automatic speech recognition and translation in English, German, French, and Spanish.
- **Qwen2.5 VL 3B Instruct FP8 Dynamic** (RedHatAI) | License: Apache-2.0 | Text-to-Image, Transformers, English | 112 downloads, 1 like
  An FP8-quantized version of Qwen2.5-VL-3B-Instruct that accepts visual and text input, produces text output, and improves inference efficiency.
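The "dynamic" in checkpoints like this one refers to computing quantization scales from the live tensors at inference time rather than from an offline calibration pass. The sketch below illustrates that idea only; it is not the RedHatAI recipe, and it approximates the FP8 E4M3 format (maximum representable value 448) with a same-range integer grid for simplicity.

```python
import numpy as np

# Conceptual sketch of dynamic per-tensor quantization: the scale is
# derived from the tensor at runtime. Real FP8 E4M3 uses a non-uniform
# floating-point grid; an integer grid stands in here for clarity.
FP8_E4M3_MAX = 448.0

def quantize_dynamic(x):
    scale = np.abs(x).max() / FP8_E4M3_MAX        # runtime per-tensor scale
    q = np.clip(np.round(x / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def dequantize(q, scale):
    return q * scale
```

Because the scale tracks the actual dynamic range of each tensor, the round-trip error is bounded by half a quantization step.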
- **Mistral Small 24B Instruct 2501 AWQ** (stelterlab) | License: Apache-2.0 | Large Language Model, Transformers, Multilingual | 52.55k downloads, 18 likes
  Mistral Small 3 (version 2501) is a 24B-parameter instruction-tuned large language model that sets a new benchmark in the sub-70B class, with high knowledge density and multilingual support.
- **Yolo11n Cs2** (Vombit) | Object Detection | 22 downloads, 1 like
  A lightweight Counter-Strike 2 player-detection model based on YOLOv11, suitable for real-time object detection.
- **Mxbai Rerank Base V1** (khoj-ai) | License: Apache-2.0 | Transformers, English | 81 downloads, 1 like
  A Transformer-based reranker model used primarily for information retrieval and search-result optimization.
- **Ja Cascaded S2t Translation** (japanese-asr) | License: Apache-2.0 | Speech Recognition, Transformers | 60 downloads, 4 likes
  A cascaded Japanese speech-to-text translation pipeline that chains automatic speech recognition (ASR) with text translation to reach any target language.
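A cascaded design like the one described above is just the composition of two stages: the ASR transcript feeds the translation model. The sketch below shows the control flow only; `asr` and `translate` are hypothetical stubs standing in for the real models.

```python
# Cascaded speech-to-text translation: ASR output feeds a text
# translation stage. Both functions are illustrative stubs.
def asr(audio: dict) -> str:
    # stub: a real pipeline would run a Japanese ASR model on the waveform
    return audio["transcript"]

def translate(text: str, target: str) -> str:
    # stub: a real pipeline would run a machine-translation model
    return f"[{target}] {text}"

def cascaded_s2t(audio: dict, target: str = "en") -> str:
    return translate(asr(audio), target)
```

The trade-off of cascading is that ASR errors propagate into the translation stage, in exchange for reusing two independently strong components.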
- **Kotoba Whisper V2.1** (kotoba-tech) | License: Apache-2.0 | Speech Recognition, Transformers, Japanese | 2,589 downloads, 16 likes
  Kotoba-Whisper-v2.1 is a Japanese automatic speech recognition (ASR) model based on Whisper, with an added post-processing stack that automatically inserts punctuation.
- **Vits Ar Sa A** (wasmdashai) | Speech Synthesis, Transformers | 227 downloads, 2 likes
  A Transformers-based text-to-speech (TTS) model that converts input text into natural-sounding speech.
- **Mobileclip S1 OpenCLIP** (apple) | Image-to-Text | 7,723 downloads, 10 likes
  MobileCLIP-S1 is an efficient image-text model that achieves fast zero-shot image classification through multi-modal reinforced training.
- **Sew Ft Fake Detection** (alexandreacff) | License: Apache-2.0 | Audio Classification, Transformers, Other | 58 downloads, 0 likes
  An audio classification model fine-tuned from asapp/sew-mid-100k on the alexandreacff/kaggle-fake-detection dataset for fake-audio detection.
- **Yolov9c Cs2** (Vombit) | Object Detection | 16 downloads, 2 likes
  A Counter-Strike 2 (CS2) player-detection model based on the YOLOv9 architecture, capable of recognizing player characters in-game.
- **Vitpose Base Simple** (nielsr) | Pose Estimation, Transformers | 109 downloads, 1 like
  A Transformers-based keypoint-detection model that identifies keypoint positions in images.
- **Mixtral 8x22B Instruct V0.1** (mistralai) | License: Apache-2.0 | Large Language Model, Transformers, Multilingual | 12.80k downloads, 723 likes
  Mixtral-8x22B-Instruct-v0.1 is an instruction-fine-tuned large language model based on Mixtral-8x22B-v0.1, with multilingual and function-calling support.
- **Faster Whisper Large V3 Ja** (JhonVanced) | License: MIT | Speech Recognition, Multilingual | 46 downloads, 3 likes
  A Japanese-optimized version of OpenAI Whisper large-v3 that supports multilingual speech recognition.
- **Faster Whisper Large V2** (Systran) | License: MIT | Speech Recognition, Multilingual | 948.29k downloads, 34 likes
  Whisper large-v2 is a large-scale automatic speech recognition (ASR) model developed by OpenAI that supports multilingual speech-to-text.
- **Juice Wrld** (sail-rvc) | Speech Synthesis, Transformers | 4,527 downloads, 0 likes
  An RVC (Retrieval-based Voice Conversion) model that transforms input audio into speech in a specific style.
- **Anya** (sail-rvc) | Speech Synthesis, Transformers | 238 downloads, 0 likes
  An RVC model built for audio-to-audio voice conversion tasks.
- **Sonic48k** (sail-rvc) | Speech Synthesis, Transformers | 25 downloads, 1 like
  An RVC-based audio-to-audio model used primarily for voice conversion.
- **Sasukeuchiha** (sail-rvc) | Speech Synthesis, Transformers | 848 downloads, 0 likes
  An RVC model that converts input audio into a specific character's voice.
- **Mileycyrus2333333** (sail-rvc) | Speech Synthesis, Transformers | 30 downloads, 0 likes
  An RVC model that converts input audio into speech in a specific style.
- **Legocitynarrator** (sail-rvc) | Speech Synthesis, Transformers | 291 downloads, 2 likes
  An RVC audio-to-audio model suited to converting speech into the Lego City narrator style.
- **Jesse Pinkman** (sail-rvc) | Speech Synthesis, Transformers | 2,697 downloads, 0 likes
  An RVC model that converts input audio into the voice of Jesse Pinkman.
- **Arthurmorgan** (sail-rvc) | Speech Synthesis, Transformers | 2,916 downloads, 2 likes
  An RVC model that converts input audio into a specific voice style.
- **Faster Whisper Large V2 Japanese 5k Steps** (zh-plus) | License: MIT | Speech Recognition, Transformers, Japanese | 280 downloads, 18 likes
  A Japanese automatic speech recognition (ASR) model based on Whisper large-v2, converted with CTranslate2 for efficient inference.
- **Extractive Question Answering Not Evaluated** (autoevaluate) | License: Apache-2.0 | Question Answering System, Transformers | 18 downloads, 2 likes
  A DistilBERT model fine-tuned on the SQuAD dataset for extractive question answering, with a high exact-match rate and F1 score.
- **Levit 256** (facebook) | License: Apache-2.0 | Image Classification, Transformers | 37 downloads, 0 likes
  LeViT-256 is an efficient Transformer-based vision model designed for fast inference and pretrained on ImageNet-1k.
- **Mobilevit Small** (apple) | License: Other | Image Classification, Transformers | 894.23k downloads, 65 likes
  MobileViT is a lightweight, low-latency vision Transformer that combines the strengths of CNNs and Transformers, making it suitable for mobile devices.
- **Mobilevit Small** (Matthijs) | License: Other | Image Classification, Transformers | 39 downloads, 0 likes
  MobileViT is a lightweight, low-latency vision Transformer that combines the advantages of CNNs and Transformers, suitable for mobile devices.
- **Sbert Chinese Qmc Finance V1 Distill** (DMetaSoul) | Text Embedding, Transformers | 20 downloads, 3 likes
  A lightweight sentence-similarity model for financial-domain question matching that distills a 12-layer BERT down to 4 layers, significantly improving inference efficiency.
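Layer-reduction distillation of this kind typically trains the small student against temperature-softened teacher outputs. The snippet below is a minimal sketch of that standard soft-label loss; the exact DMetaSoul training recipe is not documented here, so treat this as the generic technique rather than their implementation.

```python
import numpy as np

def softmax(logits, T=1.0):
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())               # shift for numerical stability
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, T=2.0):
    # KL divergence between temperature-softened teacher and student
    # distributions; the T^2 factor keeps gradient scale comparable
    # across temperatures (Hinton et al.'s convention).
    p = softmax(teacher_logits, T)        # soft targets
    q = softmax(student_logits, T)        # student predictions
    return float(np.sum(p * (np.log(p) - np.log(q)))) * T * T
```

The loss is zero when the student matches the teacher exactly and grows as their softened distributions diverge.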
- **Ms Marco TinyBERT L6** (cross-encoder) | License: Apache-2.0 | Text Embedding, English | 6,963 downloads, 1 like
  A cross-encoder trained on the MS MARCO passage-ranking task, suited to query-passage relevance scoring in information retrieval.
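A cross-encoder reranker scores each (query, passage) pair jointly and sorts candidates by that score. The flow can be sketched as below; `toy_score` is a hypothetical word-overlap stand-in for the learned model, which would otherwise encode both texts together and emit a relevance logit.

```python
# Reranking flow of a cross-encoder: score every candidate pair, then
# sort candidates by descending relevance.
def toy_score(query: str, passage: str) -> float:
    # hypothetical stand-in for the model's learned relevance score
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / max(len(q), 1)

def rerank(query: str, passages: list[str]) -> list[str]:
    return sorted(passages, key=lambda p: toy_score(query, p), reverse=True)
```

In a first-stage-retrieval plus rerank setup, a cheap retriever supplies the candidate list and the cross-encoder reorders only that short list, which is what keeps the approach affordable.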
- **BERT NLP** (subbareddyiiit) | Large Language Model | 18 downloads, 0 likes
  A general-purpose language model for various natural language processing tasks (description inferred; no further details provided).
- **Distilbart Xsum 12 3** (sshleifer) | License: Apache-2.0 | Text Generation, English | 579 downloads, 11 likes
  DistilBART is a distilled version of BART optimized for summarization, significantly reducing parameter count and inference time while maintaining strong performance.
- **Klue Bert Base Mrc** (ainize) | Question Answering System, Transformers, Korean | 120 downloads, 5 likes
  A Korean extractive question-answering model based on KLUE BERT-base, built for Korean machine reading comprehension.